SPICKER: A clustering approach to identify near-native protein folds

Authors

  • Yang Zhang
  • Jeffrey Skolnick
Abstract

We have developed SPICKER, a simple and efficient strategy to identify near-native folds by clustering protein structures generated during computer simulations. In general, the most populated clusters tend to be closer to the native conformation than the lowest energy structures. To assess the generality of the approach, we applied SPICKER to 1489 representative benchmark proteins of ≤200 residues that cover the PDB at the level of 35% sequence identity; each contains up to 280,000 structure decoys generated using the recently developed TASSER (Threading ASSembly Refinement) algorithm. The best of the top five identified folds has a root-mean-square deviation from native (RMSD) in the top 1.4% of all decoys. For 78% of the proteins, the difference between the RMSD from native of the identified models and the RMSD from native of the best individual decoy is below 1 Å; the majority belong to the targets with converged conformational distributions. Although native fold identification from divergent decoy structures remains a challenge, our overall results show significant improvement over our previous clustering algorithms.
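The idea that the most populated cluster tends to lie near the native fold can be illustrated with a small sketch. This is not the SPICKER implementation: it is a generic greedy neighbor-count clustering with a fixed cutoff, and the published method may differ in how it picks the cutoff and defines cluster centers. The names `rmsd`, `greedy_clusters`, the cutoff value, and the toy coordinates are all hypothetical, and the RMSD here assumes the decoys already share a common frame (no superposition is performed).

```python
import math


def rmsd(a, b):
    """Coordinate RMSD between two equal-length lists of (x, y, z) points.

    Assumption: both structures are already in the same frame,
    so no optimal superposition is attempted.
    """
    total = sum((p - q) ** 2 for u, v in zip(a, b) for p, q in zip(u, v))
    return math.sqrt(total / len(a))


def greedy_clusters(decoys, cutoff):
    """Group decoys by repeatedly extracting the most populated neighborhood.

    Returns a list of (center_index, member_indices), largest cluster first.
    """
    n = len(decoys)
    # Full pairwise distance matrix; fine for a toy set, costly for large ones.
    dist = [[rmsd(decoys[i], decoys[j]) for j in range(n)] for i in range(n)]
    remaining = set(range(n))
    clusters = []
    while remaining:
        # The center is the remaining decoy with the most remaining neighbors
        # within the cutoff (ties go to the lowest index).
        center = max(
            sorted(remaining),
            key=lambda i: sum(1 for j in remaining if dist[i][j] <= cutoff),
        )
        members = sorted(j for j in remaining if dist[center][j] <= cutoff)
        clusters.append((center, members))
        remaining -= set(members)
    return clusters


# Toy data: three similar two-point "structures" and two distant ones.
decoys = [
    [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)],
    [(0.1, 0.0, 0.0), (1.1, 0.0, 0.0)],
    [(0.0, 0.1, 0.0), (1.0, 0.1, 0.0)],
    [(5.0, 5.0, 5.0), (6.0, 5.0, 5.0)],
    [(5.0, 5.1, 5.0), (6.0, 5.1, 5.0)],
]
clusters = greedy_clusters(decoys, cutoff=0.5)
for center, members in clusters:
    print(f"center={center} size={len(members)} members={members}")
```

In this sketch the first cluster returned plays the role of the top-ranked candidate fold; with real decoy sets the distance matrix would normally be computed after structural superposition and on a much larger ensemble.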

Similar resources

Identify High-Quality Protein Structural Models by Enhanced K-Means

Background. One critical issue in protein three-dimensional structure prediction using either ab initio or comparative modeling involves identification of high-quality protein structural models from generated decoys. Currently, clustering algorithms are widely used to identify near-native models; however, their performance is dependent upon different conformational decoys, and, for some algorit...

Entropy-accelerated exact clustering of protein decoys

MOTIVATION Clustering is commonly used to identify the best decoy among many generated in protein structure prediction when using energy alone is insufficient. Calculation of the pairwise distance matrix for a large decoy set is computationally expensive. Typically, only a reduced set of decoys using energy filtering is subjected to clustering analysis. A fast clustering method for a large deco...

Combining inference from evolution and geometric probability in protein structure evaluation.

Starting from the hypothesis that evolutionarily important residues form a spatially limited cluster in a protein's native fold, we discuss the possibility of detecting a non-native structure based on the absence of such clustering. The relevant residues are determined using the Evolutionary Trace method. We propose a quantity to measure clustering of the selected residues on the structure and ...

Finding the needle in a haystack: educing native folds from ambiguous ab initio protein structure predictions

Current ab initio structure-prediction methods are sometimes able to generate families of folds, one of which is native, but are unable to single out the native one due to imperfections in the folding potentials and an inability to conduct thorough explorations of the conformational space. To address this issue, here we describe a method for the detection of statistically significant folds from...

Energy functions that discriminate X-ray and near native folds from well-constructed decoys.

This study generates ensembles of decoy or test structures for eight small proteins with a variety of different folds. Between 35,000 and 200,000 decoys were generated for each protein using our four-state off-lattice model together with a novel relaxation method. These give compact self-avoiding conformations each constrained to have native secondary structure. Ensembles of these decoy conform...

Journal:
  • Journal of computational chemistry

Volume 25, Issue 6

Pages: -

Publication date: 2004